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1.  Introduction 


The  embedded-atom  method  (EAM),  developed  by  Daw  and  Baskes1,2  in  the  early 
1980s,  is  a  semiempirical  iV-body  potential  useful  for  the  atomistic  simulations  of 
metal  systems.  It  has  successfully  been  used  to  calculate  the  energetics  and  struc¬ 
tures  of  complex  metallic  systems  involving  free  surfaces,  defects,  grain  bound¬ 
aries,  etc.3  The  potential  was  later  modified  by  Baskes4,5  to  include  the  direction¬ 
ality  of  bonding  in  covalent  materials  such  as  silicon  and  germanium,6  leading  to 
the  MEAM5  introduced  in  1992.  It  has  undergone  several  modifications  and  en¬ 
hancements  since  then  to  include,  for  example,  second  nearest-neighbor  (2NN)  in¬ 
teractions7^9  and,  more  recently,  a  multistate  formalism.10  The  unique  feature  of 
the  MEAM  formalism  is  its  ability  to  reproduce  the  physical  properties  of  a  large 
number  of  face-centered  cubic  (FCC),9,1 1  body-centered  cubic  (BCC),8,12  hexagonal 
close-packed  (HCP), 13,14  and  diamond  cubic15  crystal  structures  in  unary,  binary, 
ternary,  and  higher  order16  metal  systems  with  the  same  semiempirical  formalism. 
MEAM  is  also  both  reliable  and  transferable17  in  the  sense  that  it  accurately  re¬ 
produces  the  physical  properties  of  the  element  or  alloy  (reliability)  and  performs 
reasonably  well  under  circumstances  other  than  the  ones  used  for  its  parameteriza¬ 
tion  (transferability).17 

The  MEAM  formalism  has  traditionally  been  used  for  pure  metals  and  impuri¬ 
ties,  binary  and  ternary  alloys,  and  hydride,  carbide,  and  nitride  metal  systems  with 
great  success,18  as  outlined  in  a  review  article  by  Horstemeyer.19  In  addition,  com¬ 
plex  nano  structured  systems  have  been  studied  using  various  MEAM-based  po¬ 
tentials.  For  example,  Xiao  et  al.20  calculated  the  interaction  of  carbon  nanotubes 
with  Ni  nanoparticles,  and  Uddin  et  al.21  recently  studied  the  mechanical  proper¬ 
ties  of  carbon  nanotube-Ni  composites  using  the  MEAM  potential.  The  MEAM 
formalism  has  been  extended  to  saturated  hydrocarbons  with  the  ultimate  aim  of 
capturing  the  energetics  and  geometries  of  commercially  important  hydrocarbon- 
based  polymers  (polyolefins)  such  as  polyethylene  and  polypropylene.  Potentials 
such  as  MM3, 22-24  MM4,25  DREIDING,26  first-27  and  second-generation  reactive 
empirical  bond  order  (REBO),28  reactive  force  field  (ReaxFF),29  charge-optimized 
many-body  (COMB)  potential,30,31  and  condensed-phase  optimized  molecular  po¬ 
tentials  for  atomistic  simulation  studies  (COMPASS)32  have  been  used  for  hydro¬ 
carbon  simulations,  but  of  these  potentials,  only  REBO,  ReaxFF,  and  COMB  are 
reactive  and  can  allow  for  bond  breaking.  Furthermore,  most  of  these  potentials  are 
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not  suitable  for  hydrocarbon-metal  systems,  with  only  RcaxFF33  36  and  COMB37 
having  been  used  in  the  past  to  study  hydrocarbon-metal  interactions.  Liang  et  al.38 
reviewed  the  use  of  reactive  potentials  for  advanced  atomistic  simulations. 

A  new  set  of  parameters  within  the  MEAM  framework  was  recently  developed 
to  describe  the  interactions  and  equilibrium  geometries  of  saturated  hydrocarbons, 
specifically  bond  distances,  bond  angles,  and  atomization  energies  at  0  K.  It  was 
shown  that  MEAM  gives  a  comparable  or  more  accurate  reproduction  of  these  prop¬ 
erties  relative  to  experimental  and  first-principles  data  in  comparison  with  REBO 
and  ReaxFF.  It  was  also  able  to  reproduce  the  potential  energy  curves  of  H2,  CH, 
and  C2  diatomics  and  (H2)2,  (CH4)2,  (C2H6)2,  and  (C3H8)2  dimer  configurations 
and  predict  the  pressure- volume-temperature  (PVT)  relationships  of  a  series  of  se¬ 
lect  methane,  ethane,  propane,  and  butane  systems  in  a  reasonable  agreement  with 
the  experimental  data.  The  energetics  of  C-H  and  C-C  bond  breaking  in  methane 
and  ethane  and  the  heat  of  reaction  for  select  chemical  reactions  have  also  been 
explored,  and  it  was  found  that  MEAM  gives  reasonable  predictions  of  the  energies 
associated  with  these  2  major  chemical  reactions  in  saturated  hydrocarbons. 

The  present  work  is  associated  with  the  initial  development  of  the  C-H  MEAM 
potential.39  In  parameterizing  this  potential,  a  number  of  other  studies  were  per¬ 
formed  to  understand  the  relative  influence  of  various  parameters  on  the  training 
set  of  properties  for  these  different  molecules  and  configurations.  Moreover,  further 
analysis  can  shed  light  on  the  relationship  between  different  properties  and  the  sim¬ 
ilarity/dissimilarity  between  different  molecules  within  the  training  set.  In  doing 
these  analyses,  reduced  sampling  strategies  were  investigated,  including  fractional 
factorial  designs  and  Latin  hypercube  sampling  designs.  A  few  of  these  studies  and 
their  analyses  are  presented  herein  as  methods  and  tools  for  both  understanding  and 
developing  interatomic  potentials.  This  work  builds  upon  prior  studies  by  the  vari¬ 
ous  authors. 39-41  This  report  is  organized  in  the  following  manner.  In  Section  2  the 
theory  of  the  MEAM  formalism  is  reviewed.  In  Section  3,  the  simulation  results  for 
the  2  studies  is  presented  and  analyzed.  Last,  the  summary  follows  in  Section  4. 
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2.  Simulation  Methodology 


2.1  Modified  Embedded-Atom  Method  Theory 

In  the  EAM  and  MEAM  formalisms1-2  5  the  total  energy  of  a  system  of  atoms  (Etot) 
is  given  by 


(1) 


where  Fn  is  the  embedding  energy  function  for  element  type  t,  ,  which  is  defined  as 
the  energy  required  to  embed  an  atom  of  element  type  t*  in  the  background  electron 
density  pi  at  site  i,  Sij  is  the  screening  factor  between  atoms  at  sites  i  and  j  (defined 
in  Eqs.  25  and  28),  and  (j)TiTj  is  the  pair  interaction  between  atoms  of  element  types 
Tj  and  Tj  at  sites  i  and  j  at  the  separation  distance  of  R,j .  To  emphasize  the  multi- 
component  nature  of  the  model,  the  element  type  of  the  atom  at  site  i  is  denoted 
as  Tj  in  this  manuscript  to  distinguish  it  from  site  designation  i,  and  the  screening 
factor  is  explicitly  separated  from  the  pair  potential.  The  following  subsubsections 
describe  the  calculation  of  the  embedding  energy  function  and  screening  factor  for 
the  MEAM  formalism. 

2.1.1  Embedding  Energy  Function 

The  embedding  function  is  given  by  the  specific  simple  form 


if  Pi  >  0 


(2) 


if  Pi  <  0 


where  AT.  is  a  scaling  factor,  E®  is  the  sublimation  (cohesive)  energy,  and  is  the 
background  electron  density  for  the  reference  structure  of  the  atom  of  element  type 
t?;  at  site  i.  For  most  elements,  the  reference  structure  is  the  equilibrium  structure 
of  the  element  in  its  reference  state.  However,  the  reference  structure  of  carbon  is 
taken  as  diamond.  We  will  denote  the  properties  of  the  equilibrium  reference  state 
with  a  superscript  zero.  The  analytic  continuation  of  the  embedding  function  for 
negative  electron  densities  was  considered  as  a  computational  convenience  to  pre- 
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vent  systems  from  entering  this  unphysical  regime.  The  origin  of  negative  electron 
densities  arises  in  Eq.  8.  The  MEAM  formalism  introduces  directionality  in  bond¬ 
ing  between  atoms  through  decomposing  pi  into  spherically  symmetric  (pf})  and 
angular  (pf  \  p[2\  and  pf] )  partial  electron  densities5, 18,42  as  given  by 
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prf  }  (h  =  {0, 1,  2,  3})  indicate  the  atomic  electron  densities  from  atom  of  element 
type  Tj  at  site  j  at  distance  R%3  from  site  i.  R3j,  R3j,  and  RY  represent  the  a,  f3, 
and  7  components  of  the  distance  vector  between  atoms  at  sites  i  and  j,  respec¬ 
tively,  and  trf  (h  =  {1,2,3})  are  adjustable  element-dependent  parameters.  The 
equivalence  between  these  expressions  and  an  expansion  in  Legendre  polynomials 
has  been  discussed  previously.5  As  previously  discussed,  we  have  carefully  denoted 
the  element  types  of  the  atoms,  and  separated  the  screening  from  the  atomic  elec¬ 
tron  densities.  Equation  3  is  the  simple  linear  superposition  of  atomic  densities  of 
the  EAM  formalism,1,2  and  Eqs.  4-6  reduce  to  more  familiar  forms  in  the  origi¬ 
nal  MEAM  paper  by  Baskes5  for  a  single-component  system.  The  partial  electron 
densities  can  be  combined  in  different  ways  to  give  the  total  background  electron 
density  at  site  i  (pi).  Here,  we  adopt  one  of  the  most  widely  used  forms,1617,43  which 
is  given  by 


Pi 


pTg(  ro 


(7) 


G(r4) 


\/i  +  Tj  if  r i  >  —  1 

— \/|  i  +  |  if  r,  <  —  1 


(8) 


r.  =  E‘1 


+ih) 


h= 1 


Pi 


(■ h ) 


P: 


(0) 


(9) 


xW 


(10) 


In  the  absence  of  angular  contributions  to  the  density,  V,  =  0,  GiY,)  =  1,  the  model 
reduces  to  the  EAM  formalism.  For  systems  with  negative  values  in  certain 
geometries,  r*  <  -1,  and  for  computational  convenience,  we  perform  an  analytic 
continuation  of  G  (T*).  We  choose  to  do  this  by  allowing  G  (T*)  and,  hence,  pi  to 
become  less  than  zero. 

If  we  apply  Eqs.  7  and  9  to  the  equilibrium  reference  structure,  we  obtain 
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fr  =  Z°rP°rG  (K)  , 


(11) 


r 


0 


(12) 


where  we  have  assumed  that  the  reference  structure  has  only  first  nearest-neighbor 
(INN)  interactions.  In  Eq.  11,  is  an  element-dependent  electron  density  scaling 
factor,  and  Z °  is  the  INN  coordination  number  of  the  reference  structure.  .sv'j  (h  = 
(1,  2,  3})  are  “shape  factors”  that  depend  on  the  reference  structure  for  element  type 
r.  The  shape  factors  are  given  in  the  original  MEAM  paper  by  Baskes.5  The  atomic 
electron  density  for  element  type  r  is  calculated  from 


P‘R  (R)  =  P> 


~P 


(■ h ) 


(13) 


where  (h  =  (0,1,  2,  3})  are  adjustable  element-dependent  parameters,  and 
R®  is  the  nearest-neighbor  distance  in  the  equilibrium  reference  structure  for  the 
element  type  r. 

The  pair  interaction  for  like  atoms  of  element  type  r  can  be  calculated  using  a  INN5 
or  2NN7,18  formalism.  In  this  work,  the  former  is  used1617  and  is  given  by 


<t>TT  (R)  =-§o{K  (R)  -  Fr  W  (*)]  }  •  (14) 

In  this  equation  prTef  (R)  is  the  background  electron  density  in  the  reference  struc¬ 
ture  evaluated  from  Eqs.  7-10  at  a  nearest-neighbor  distance  of  R  and  is  given  by 


K”  (R)  =  z°p°g  (rp) , 


(15) 
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r  rTef  = 
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a(h) 


h=  1 


AV 


a(0) 


(16) 
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and  £y  is  the  universal  equation  of  state  (UEOS)  of  Rose  et  al.44  for  element  type 
r  given  by 


or 


E “  (f?)  =  -£° 


R° 

l  +  a*  +  5-^  (a*Y 
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5“  if  a*  >  0 
if  a*  <  0 


(19) 
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In  Eqs.  17-21,  K®,  k%  and  are  the  bulk  modulus,  spring  constant,  and  the  atomic 
volume  of  the  reference  structure,  respectively,  and  S  is  an  adjustable,  element- 
dependent  parameter  that  has  2  components,  attractive  and  repulsive  SL  Equa¬ 
tion  20  is  used  when  the  reference  structure  is  a  3-dimensional  crystal,  and  Eq.  21 
is  used  when  the  reference  structure  is  a  diatomic. 

The  pair  interaction  for  unlike  atoms  of  element  types  r  and  v  is  similarly  obtained 
from  the  reference  structure  of  the  unlike  atoms.  For  this  work,  the  reference  struc¬ 
ture  is  taken  as  the  heteronuclear  diatomic,  which  gives 


(f>TV  (-R) 


{2E“,  ( R )  -  Ft  K  (fl)]  -  F„  \jS>T  («)]  }  , 


(22) 


where  =  1  is  the  coordination  number  for  the  diatomic  and 
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pi  (R)  =  pT]G  (rj) , 


(23) 


r 


d 

T 


3 


h=  1 


(24) 


where  the  shape  factors  s'/°  are  those  for  a  diatomic.  The  UEOS  7f"  ,  is  given  by 
Eqs.  17-21  using  parameters  E®v,  R°TV,  k®v,  5°v,  and  (?TV. 

2.1.2  Screening  Factor 

The  screening  factor  SV,  is  defined  as  the  product  of  all  screening  factors  Slk],  where 
the  interaction  between  atoms  at  sites  i  and  j  are  screened  by  neighboring  atoms  at 
site  k  as  given  by 


n 

k=£i,j 


s, 


ikj  ■ 


(25) 


If  it  is  assumed  that  all  3  sites  i,  j,  and  k  lie  on  an  ellipse  on  the  xy-planc  with  sites 
i  and  j  on  the  x-axis,  the  following  equation  can  be  derived: 


2  ,  1  2 
x  +  -^y 


(26) 


where 


Cikj 


2  (Xjk  +  Xkj)  —  ( Xjk  —  Xkjf_  ~  1 
1  -  (X~k  -  Xkj)2 


(27) 


In  Eq.  27,  Xlk  =  (/?,/.■/ R,j)2  and  Xkj  =  ( Rk:j / R^)2 .  The  screening  factor  Sikj  for 
like  atoms  is  defined  as 


Sikj  =  fc 


Cikj  Crnin.  ( R i  Tfc,  T j ) 

Cmax  (Xii  Xki  Xj  )  Crain  (Xi-  Xk  •  X j  )  J 


(28) 
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where  Crnvn  (r,:,  rk,Tj)  and  Cmax  (Ti,Tk,Tj)  determine  the  extent  of  screening  of 
atoms  of  element  type  r  at  sites  i  and  j  by  an  atom  at  site  k.  Similar  expressions 
can  be  written  for  the  screening  of  unlike  atoms.  The  smooth  cutoff  function  fc  is 
defined  as 


{1  if  x  >  1 

[1  — (1— a;)4]2  if  0  <  a:  <  1  •  (29) 

0  if  x  <  0 

Sij  =  1  means  that  the  interaction  between  atoms  at  sites  i  and  j  is  not  screened, 
while  =  0  means  the  interaction  is  completely  screened. 

2.1.3  Modified  Embedded-Atom  Method  Parameterization 

The  MEAM  formalism  presented  in  Eqs.  1-29  requires  16  independent  model  pa¬ 
rameters  for  each  element  type  r  (i.e.,  E®,  R®,  a1/,  <5“,  and  d1!  for  the  UEOS  [Eq.  17]; 
/3t°\  /3^\  /3t2\  ^t  \  t^r  \  t?\  t?'1,  and  for  the  electron  densities  [Eqs.  3-13];  Ar 
for  the  embedding  function  FT  [Eq.  2];  and  C'mm  and  Crnax  for  the  screening  factor 
[Eqs.  25-29]).  In  the  current  MEAM  formalism  for  a  single  element,  the  model  is 
independent  of  p°;  hence,  px  =  1  is  taken  for  one  of  the  elements.  For  a  diatomic 
composed  of  elements  r  and  v,  13  additional  independent  parameters  are  required 
(i.e.,  Exv,  R°TV,  a%,  d%,  and  8rTV,  4  Cmin,  and  4  Cmax  values).  A  small  molecule 
database  was  used  in  the  fitting  of  the  C-H  MEAM  interatomic  potential.  Further 
details  of  the  parameterization  are  provided  in  Nouranian  et  al.39  The  simulations 
were  run  using  the  molecular  dynamics  code,  DYNAMO.45 

2.2  Fractional  Factorial  Design 

A  design  of  experiments  approach  was  used  to  assess  the  influence  of  each  pa¬ 
rameter  on  the  small  molecule  database  that  was  used  for  tuning  the  C-H  MEAM 
interatomic  potential.  A  fractional  factorial  design  was  chosen  to  identify  the  signif¬ 
icant  main  effects  on  a  large  number  of  responses:  molecular  atomization  energies, 
average  bond  lengths  (C-C,  C-H,  H-H),  average  bond  angles  (C-C-C,  C-C-H, 
and  H-C-H),  average  dihedral  angles  (C-C-C-C,  C-C-C-H,  H-C-C-H),  and  dif¬ 
ference  in  the  energy  vs.  distance  relationship  from  first  principles  calculations  for 
several  molecules  (2C,  2H,  C-H,  CH4,  and  C2H6)  and  paths.  Table  1  shows  an  ex¬ 
ample  of  a  fractional  factorial  experiment  for  5  factors  (in  this  study,  we  used  26 
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parameters)  with  2  levels  (—1  and  +1).  The  25  factorial  is  a  full  factorial  design 
and  requires  32  runs  to  test  all  combinations.  For  fractional  factorial  designs,  only 
a  fraction  of  these  runs  are  used,  which  results  in  certain  effects  being  confounded 
with  each  other  (e.g.,  it  may  not  be  possible  to  understand  whether  a  response  is 
due  to  factor  a  or  due  to  an  interaction  between  factor  b  and  factor  c).  The  reso¬ 
lution  of  a  fractional  factorial  design  details  the  extent  of  confounding  within  the 
design.  In  Table  1,  the  25-1  fractional  factorial  design  is  a  Resolution  IV  design, 
where  the  main  effects  are  not  confounded  with  any  other  main  effects  or  2-factor 
interactions,  but  the  2  factor  interactions  are  confounded  with  each  other.  The  25-2 
fractional  factorial  design  is  a  Resolution  III  design  that  only  requires  8  runs  (one- 
quarter  of  a  full  factorial  design)  but  where  the  main  effects  may  be  confounded 
with  2-factor  interactions.  In  the  present  study,  a  226-20  fractional  factorial  design 
is  used  (Table  2),  which  requires  64  runs.  This  factorial  design  is  a  Resolution  IV 
design,  so  the  main  effects  are  unconfounded  with  each  other  or  with  2-factor  in¬ 
teractions,  providing  information  about  which  parameters  significantly  affect  the 
responses  being  examined.  Table  3  shows  the  26  MEAM  parameters  used  in  the 
present  study  and  the  —1  and  +1  levels  for  each  factor. 
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Table  1  Example  of  fractional  factorial  designs  (25  2  and  25  1  designs)  vs.  a  full  factorial 
design  (25  design)  for  5  factors:  a,  b ,  c,  d,  e 


25  2  Factorial  Design  25  1  Factorial  Design  25  Factorial  Design 
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Table  3  The  26  MEAM  parameters  and  their  assigned  values  in  the  226  20  fractional  facto¬ 
rial  design 


Xi 

Parameter 

-i 

+  1 

Xi 
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(alphal) 
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(bOl) 
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0.06 
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The  next  step  is  to  assign  the  interatomic  potential  parameters  to  the  26  factors  and 
to  select  appropriate  numeric  values  for  the  potential  parameters  (i.e.,  the  —1  and 
+1  factor  levels).  In  the  following  study,  the  MEAM  parameters  identified  in  prior 
work39  were  used  as  baseline  values  and  the  —1  and  +1  factor  levels  were  chosen 
to  bracket  these  values. 

2.3  Tests:  Molecules  and  Property 

The  last  step  is  to  define  the  responses  used  for  the  present  study.  Figure  1  shows 
an  example  of  some  of  the  molecules  used  for  parameterization  of  the  potential 
as  well  as  used  herein.  The  molecule  set  represents  a  diversity  of  potential  struc¬ 
tures  for  the  C-H  MEAM  potential:  dimers,  alkanes,  alkenes,  alkynes,  cycloalka¬ 
nes,  aromatics,  and  radicals,  some  of  which  are  depicted  in  Fig.  1.  The  various 
properties  explored  for  each  molecule  include  0  K  energy,  C-C  and  C-H  bond  dis¬ 
tances,  C-C-C,  C-C-H,  and  H-C-H  bond  angles,  and  C-C-C-C,  C-C-C-H,  and 
H-C-C-H  dihedral  angles.  For  the  case  of  molecules  with  multiple  values  for  bond 
distances,  bond  angles,  and  dihedral  angles,  the  average  value  for  all  bond/angle/di¬ 
hedral  types  of  that  nature  are  used.  Another  property  that  is  quantified  within  is  the 
energy  vs.  distance  relationship  for  a  few  dimers  and  small  molecules.  For  example, 
Fig.  2  depicts  2  C2H6  molecules  that  are  oriented  in  different  ways  with  respect  to 
each  other.  Each  of  these  molecules  is  then  separated  by  different  distances  while 
constraining  the  atoms  and  the  energy  is  calculated.  This  type  of  calculation  is  per¬ 
formed  for  H2,  CH4,  and  C2H6,  and  was  compared  to  first  principles  results  as 
discussed  in  Nouranian  et  al.39 
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CH3C(CH3)2CH2CH2CH3  CH3C(CH3)2CH2CH3  CH3C(CH3)2CH3 


ch3ch2chch3ch2ch3  ch3chch3ch2ch2ch2ch3 

Fig.  1  Example  of  some  of  the  molecules  used  in  the  parameterization  of  the  C-H  MEAM 
potential  and  in  the  property  assessment  in  this  study 
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w  w 

2C2H6-1 

J ^ 

2C2H6-3 


2C2H6-2 


2C2H6-4 


Fig.  2  Example  of  some  of  the  molecule  configuration  paths  used  for  the  energy-distance  rela¬ 
tionships  (as  distance  is  increased  in  horizontal  direction),  which  was  then  used  in  the  param¬ 
eterization  of  the  C-H  MEAM  potential  and  in  property  assessment  in  this  study 


3.  Simulation  Results 

3.1  Analysis  of  Variance 

Factorial  or  fraction  factorial  designs  can  be  analyzed  by  describing  the  response 
variables  in  terms  of  an  overall  mean,  a  contribution  from  the  treatment  (i.e.,  due 
to  the  different  factors  and  factor  levels),  and  a  contribution  from  random  error — 
in  the  case  of  minimum  energy  molecules  herein,  the  responses  for  each  set  of 
parameters  is  unique  and  so  only  one  simulation  is  required.  The  variability  in  the 
responses  is  a  function  of  the  treatment  and,  hence,  the  various  factors  and  factor 
levels.  So,  an  effective  method  for  analyzing  factorial  experiments  is  to  measure 
the  total  deviation  from  the  mean  response  (i.e.,  sum  of  squares)  and  partition  this 
into  the  different  factors  (i.e.,  the  analysis  of  variance  [ANOVA]  method).  This 
partitioning  of  the  variability  into  the  different  factors  enables  testing  the  hypothesis 
of  whether  any  particular  factor  has  a  significant  influence  on  the  total  variability; 
this  is  often  expressed  through  an  F  ratio  test  statistic  and  a  p- value. 

The  ANOVA  technique  is  performed  for  the  energy  of  the  CH4  molecule  in  Table  4 
to  illustrate  a  summary  of  the  information  obtained.  Again,  recall  that  the  facto¬ 
rial  design  in  Table  2  describes  the  values  for  all  the  MEAM  parameters  for  the 
64  different  simulations  (treatments)  of  the  CH4  molecule.  The  CH4  molecule  is 
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minimized  using  a  gradient-based  energy  minimization  technique,  and  the  final  en¬ 
ergy  and  positions  of  the  atoms  are  used  for  various  responses:  atomization  energy, 
C-H  bond  length,  and  H-C-H  bond  angle.  The  data  from  the  ANOVA  procedure 
are  listed  in  Table  4,  which  includes  several  columns  of  data:  1)  the  total  sum  of 
squares  as  well  as  the  sum  of  squares  attributed  to  each  factor  and  the  error;  2)  the 
degrees  of  freedom  (d.f.),  which  is  equal  to  1  for  all  factors;  3)  the  mean  squares 
of  the  various  factors  and  the  error,  which  is  an  unbiased  estimate  of  the  variance; 
4)  the  F  test  statistic,  which  is  a  ratio  of  the  mean  square  of  the  factors  to  that  of 
the  error;  and  5)  the  p-value,  which  is  related  to  the  hypothesis  that  there  are  dif¬ 
ferences  in  the  variance  due  to  a  particular  factor.  There  are  a  few  important  results 
that  can  be  gleaned  from  this  information.  First,  the  p-value  indicates  which  factors 
are  significant;  a  value  below  0.05  indicates  a  95%  confidence  level  for  significance. 
Hence,  in  terms  of  the  energy  of  this  molecule,  there  are  7  statistically  significant 
factors  at  a  95%  confidence  level:  esubl,  asubl,  b02,  alat2,  esub2,  deltas  12,  and 
res  12.  Second,  the  sum  of  squares  column  indicates  the  degree  of  influence  of  each 
parameter  and  dividing  by  the  total  sum  of  squares  gives  the  percent  contribution 
for  each  factor  to  the  response.  So,  the  percent  contribution  of  each  statistically  sig¬ 
nificant  factor  can  be  calculated:  49.7%  (deltasl2),  22.1%  (esub2),  17.5%  (esubl), 
and  9.1%  (asubl)  (i.e.,  98.3%  of  the  variability  in  this  study  is  caused  by  these  4 
factors).  Another  way  of  interpreting  this  is  that  a  linear  regression  model  is  fit  using 
the  various  parameters  outlined  in  Table  4,  and  the  sum  of  squares  column  is  simply 
a  reflection  of  how  much  of  the  original  variability  in  properties  is  accounted  for  by 
each  parameter.  In  this  manner,  the  error  term  gives  the  ability  to  describe  the  data 
using  this  linear  regression  model;  in  truth,  if  a  predictive  model  is  the  goal,  then 
including  parameter  interaction  terms  or  other  mathematical  model  formulations 
may  better  capture  the  relationship  between  parameters  and  responses.  This  can  be 
easily  accomplished  but  is  outside  the  scope  of  this  work. 
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Table  4  ANOVA  analysis  of  energy  of  CH4  molecule 


Source 

Sum  Sq. 

d.f. 

Mean  Sq. 

F 

p- value  (Prob>  F ) 

meam.  alpha  1 

0.000 

1 

0.000 

0.000 

1.000 

meam.bOl 

0.000 

1 

0.000 

0.000 

0.999 

meam.bll 

0.000 

1 

0.000 

0.000 

0.998 

meam.b21 

0.000 

1 

0.000 

0.000 

0.992 

meam.b31 

0.000 

1 

0.000 

0.000 

0.988 

meam.alatl 

0.000 

1 

0.000 

0.000 

0.987 

meam.esubl 

31.065 

1 

31.065 

16,413.814 

0.000 

meam.asubl 

16.099 

1 

16.099 

8,506.316 

0.000 

meam.alpha2 

0.000 

1 

0.000 

0.005 

0.946 

meam.b02 

0.016 

1 

0.016 

8.591 

0.006 

meam.bl2 

0.001 

1 

0.001 

0.717 

0.402 

meam.b22 

0.002 

1 

0.002 

0.813 

0.373 

meam.b32 

0.000 

1 

0.000 

0.036 

0.850 

meam.alat2 

1.895 

1 

1.895 

1,001.323 

0.000 

meam.esub2 

39.280 

1 

39.280 

20,754.279 

0.000 

meam.asub2 

0.000 

1 

0.000 

0.000 

0.991 

meam.rozero2 

0.000 

1 

0.000 

0.004 

0.950 

parameter.deltas  1 2 

88.377 

1 

88.377 

46,695.244 

0.000 

parameter.resl2 

0.997 

1 

0.997 

526.704 

0.000 

parameter,  alphas  1 2 

0.000 

1 

0.000 

0.086 

0.771 

parameter.attracl  1 

0.000 

1 

0.000 

0.004 

0.949 

parameter.repuls  1 1 

0.000 

1 

0.000 

0.000 

0.999 

parameter.attrac  1 2 

0.000 

1 

0.000 

0.113 

0.739 

parameter.repuls  12 

0.000 

1 

0.000 

0.000 

0.998 

parameter.  attrac22 

0.000 

1 

0.000 

0.000 

0.999 

parameter.repuls22 

0.000 

1 

0.000 

0.000 

0.999 

Error 

0.070 

37 

0.002 

Total 

177.803 

63 

3.2  Interatomic  Potential  Parameter  Sensitivity  via  Analysis  of  Variance 

The  present  study  contains  30  different  tests,  which  each  have  a  number  of  different 
responses  that  can  be  evaluated  against  the  factors  explored  in  the  factorial  design 
using  an  ANOVA  procedure  similar  to  that  illustrated  in  Table  4.  The  large  number 
of  responses  included  in  the  present  study  required  exploring  different  techniques 
for  visualizing  and  analyzing  the  information.  One  way  to  examine  this  informa- 
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tion  is  to  use  a  contour  plot  to  visualize  how  the  percent  contribution  for  each  factor 
changes  for  each  molecule.  This  type  of  plot  is  shown  in  Fig.  3,  which  depicts 
how  the  percent  contribution  to  the  energy  for  each  factor  changes  as  a  function  of 
molecule  type.  On  the  horizontal  axis  are  the  30  different  molecules  in  the  present 
study,  ranging  from  C2  to  C2H5.  On  the  vertical  axis  are  the  26  different  MEAM 
parameters.  The  contours  are  related  to  the  percent  contribution.  The  bands  of  in¬ 
creased  percent  contribution  indicate  the  parameter  sensitivity  for  the  properties  (in 
this  case,  molecule  energy)  (i.e.,  visually  depicting  the  relative  influence  of  param¬ 
eters  for  tuning  a  particular  property).  In  general,  the  deltas  12  term  (EqH)  has  the 
largest  influence  for  most  molecules,  followed  by  esubl  (Eq)  and  esub2  (E^),  as 
might  be  expected  from  the  formulation  of  the  potential  (i.e.,  these  parameters  de¬ 
fine  the  sublimation/cohesive  energies  of  the  reference  state  for  C,  H,  and  C-H).  The 
bands  indicate  that  these  parameters  affect  the  molecule  energies  in  a  similar  man¬ 
ner  despite  the  very  different  geometries  associated  with  the  different  molecules. 
The  C  and  H  dimers  are  heavily  influenced  by  asubl  (Ac)  and  esub2  (E{]H).  The 
fact  that  the  H  dimer  energy  is  completely  influenced  by  is  no  surprise;  the 
reference  state  for  the  H  MEAM  potential  is  the  dimer.  However,  interestingly,  the 
C  dimer  energy  is  influenced  much  more  by  Ac  (scaling  factor  for  the  background 
electron  density  of  C,  Eq.  2)  rather  than  Eq,  which  is  due  to  the  diamond  cubic 
reference  state  for  C  as  opposed  to  the  dimer  reference  state. 

The  data  displayed  on  the  subsequent  contour  plots  were  computed  in  the  same 
manner,  but  with  bond  distances  (Figs.  4  and  5),  bond  angles  (Figs.  6-8),  dihedral 
angles  (Figs.  9-11),  and  the  energy  vs.  distance  curves  for  various  molecules  and 
paths  (Figs.  12  and  13). 

Figures  4  and  5  show  the  contribution  of  the  MEAM  parameters  to  the  C-H  and 
C-C  bond  distances,  respectively.  The  bond  distances  are  most  influenced  by  res  12 
(B°ch),  asubl  (Ac),  and  alatl  (R{)c).  The  C-H  bond  distance  is  affected  by  R?c H,  a 
parameter  associated  with  the  distance  of  the  C-H  reference  state  (dimer),  and  Ac, 
the  scaling  factor  for  the  electron  density.  The  C-C  bond  distance  is  mainly  affected 
by  R{)c  for  all  molecules  with  a  minor  effect  of  Ac  for  C2  and  C2H2. 
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Fig.  3  Summary  of  ANOVA  results  for  the  energies  of  all  molecules  examined.  The  percent 
contribution  by  each  of  the  MEAM  parameters  was  determined  by  dividing  the  sum  of  squares 
attributed  to  each  parameter  by  the  total  sum  of  squares  (e.g.,  see  Table  4),  which  was  repeated 
for  each  molecule. 
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Fig.  4  Summary  of  ANOVA  results  for  the  C-H  bond  distances  of  all  molecules  examined.  The 
percent  contribution  by  each  of  the  MEAM  parameters  was  determined  by  dividing  the  sum 
of  squares  attributed  to  each  parameter  by  the  total  sum  of  squares  (e.g.,  see  Table  4),  which 
was  repeated  for  each  molecule.  The  mean  bond  distance  for  all  C-H  bonds  was  used  for  those 
molecules  with  multiple  C-H  bonds. 
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Fig.  5  Summary  of  ANOVA  results  for  the  C-C  bond  distances  of  all  molecules  examined. 
The  percent  contribution  by  each  of  the  MEAM  parameters  was  determined  by  dividing  the 
sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of  squares  (e.g.,  see  Table  4), 
which  was  repeated  for  each  molecule.  The  mean  bond  distance  for  all  C-C  bonds  was  used 
for  those  molecules  with  multiple  C-C  bonds. 
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Figures  6-8  show  the  contribution  of  the  MEAM  parameters  to  the  C-C-C,  C-C-H, 
and  H-C-H  bond  angles,  respectively.  In  contrast  to  the  energies  and  bond  distances 
of  the  various  molecules,  multiple  MEAM  parameters  contribute  to  the  different 
bond  angles  and  these  contributions  change  as  a  function  of  the  molecule  type. 
In  general,  the  bond  angles  are  mainly  influenced  by  asubl  {Ac),  alatl  {R°c),  at- 
trac22  (6%),  asub2  (Ah),  alat2  (. R°H ),  bl2  {($)■>  and  resl2  {R°CH).  Interestingly,  the 
relationship  between  MEAM  parameters  and  the  bond  angles  are  not  always  as  in¬ 
tuitive.  For  instance,  the  variation  in  the  C-C-C  bond  angle  was  mainly  influenced 
by  modifications  to  the  H  MEAM  parameters  (i.e.,  R°H  and  8%),  which  also  played 
a  significant  role  in  the  C-C-H  and  H-C-H  bond  angles.  There  are  also  significant 
shifts  between  different  molecule  families  (alkanes  vs.  cycloalkanes  vs.  polyolefins) 
in  terms  of  the  significance  of  these  various  MEAM  parameters.  That  is,  the  effects 
to  these  bond  angles  in  one  molecule  is  not  necessarily  correlated  to  effects  in  an¬ 
other  molecule;  this  is  further  examined  by  correlating  properties  for  all  the  various 
molecules. 

Figures  9-11  show  the  contribution  of  the  MEAM  parameters  to  the  C-C-C-C, 
C-C-C-H,  and  H-C-C-H  dihedral  angles,  respectively.  Similarly  to  the  bond  an¬ 
gles,  the  dihedral  angles  are  affected  by  a  number  of  the  same  MEAM  parameters: 
asubl  {Ac),  alatl  {R°c),  attrac22  {5aH),  asub2  {AH),  alat2  {R°H),  bl2  {/3^),  and  resl2 
{R°ch)-  Again,  interestingly,  some  of  the  strongest  influences  to  the  C-C-C-C  dihe¬ 
dral  angles  are  the  contributions  of  the  single  element  H  MEAM  potential  {8jj,  Ah, 
(3$)  in  addition  to  some  of  the  single  element  C  and  C-H  interaction  MEAM  pa¬ 
rameters.  In  general,  the  attraction  term  for  hydrogen  8^  has  a  large  influence  on  all 
of  the  dihedral  angles.  Some  of  the  trends  in  MEAM  parameter  sensitivities  change 
with  molecule  as  well.  For  instance,  the  H-C-C-H  dihedral  angle  is  significantly 
affected  by  R°c  for  the  alkane  series  of  molecules  with  greater  than  3  C  atoms. 
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Fig.  6  Summary  of  ANOVA  results  for  the  C-C-C  bond  angles  of  all  molecules  examined. 
The  percent  contribution  by  each  of  the  MEAM  parameters  was  determined  by  dividing  the 
sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of  squares  (e.g.,  see  Table  4), 
which  was  repeated  for  each  molecule.  The  mean  bond  angle  for  all  C-C-C  bond  angles  was 
used  for  those  molecules  with  multiple  C-C-C  bond  angles. 
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Fig.  7  Summary  of  ANOVA  results  for  the  C-C-H  bond  angles  of  all  molecules  examined. 
The  percent  contribution  by  each  of  the  MEAM  parameters  was  determined  by  dividing  the 
sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of  squares  (e.g.,  see  Table  4), 
which  was  repeated  for  each  molecule.  The  mean  bond  angle  for  all  C-C-H  bond  angles  was 
used  for  those  molecules  with  multiple  C-C-H  bond  angles. 
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Fig.  8  Summary  of  ANOVA  results  for  the  H-C-H  bond  angles  of  all  molecules  examined. 
The  percent  contribution  by  each  of  the  MEAM  parameters  was  determined  by  dividing  the 
sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of  squares  (e.g.,  see  Table  4), 
which  was  repeated  for  each  molecule.  The  mean  bond  angle  for  all  H-C-H  bond  angles  was 
used  for  those  molecules  with  multiple  H-C-H  bond  angles. 
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Fig.  9  Summary  of  ANOVA  results  for  the  C-C-C-C  dihedral  angles  of  all  molecules  ex¬ 
amined.  Since  there  are  primarily  2  dihedral  angles  in  the  tested  molecules  (60°  and  180°), 
the  absolute  value  of  the  minimum  difference  between  each  dihedral  angle  and  these  2  ref¬ 
erence  angles  was  used.  The  percent  contribution  by  each  of  the  MEAM  parameters  was  de¬ 
termined  by  dividing  the  sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of 
squares  (e.g.,  see  Table  4),  which  was  repeated  for  each  molecule.  The  mean  dihedral  angle  for 
all  C-C-C-C  angles  was  used  for  those  molecules  with  multiple  C-C-C-C  dihedral  angles. 
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Fig.  10  Summary  of  ANOVA  results  for  the  C-C-C-H  dihedral  angles  of  all  molecules  ex¬ 
amined.  Since  there  are  primarily  2  dihedral  angles  in  the  tested  molecules  (60°  and  180°), 
the  absolute  value  of  the  minimum  difference  between  each  dihedral  angle  and  these  2  ref¬ 
erence  angles  was  used.  The  percent  contribution  by  each  of  the  MEAM  parameters  was  de¬ 
termined  by  dividing  the  sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of 
squares  (e.g.,  see  Table  4),  which  was  repeated  for  each  molecule.  The  mean  dihedral  angle  for 
all  C-C-C-H  angles  was  used  for  those  molecules  with  multiple  C-C-C-H  dihedral  angles. 
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Fig.  11  Summary  of  ANOVA  results  for  the  H-C-C-H  dihedral  angles  of  all  molecules  ex¬ 
amined.  Since  there  are  primarily  2  dihedral  angles  in  the  tested  molecules  (60°  and  180°), 
the  absolute  value  of  the  minimum  difference  between  each  dihedral  angle  and  these  2  ref¬ 
erence  angles  was  used.  The  percent  contribution  by  each  of  the  MEAM  parameters  was  de¬ 
termined  by  dividing  the  sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of 
squares  (e.g.,  see  Table  4),  which  was  repeated  for  each  molecule.  The  mean  dihedral  angle  for 
all  H-C-C-H  angles  was  used  for  those  molecules  with  multiple  H-C-C-H  dihedral  angles. 
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Figures  12  and  13  show  the  contribution  of  the  MEAM  parameters  to  the  energy 
vs.  distance  relationships  for  various  molecules  and  paths,  as  outlined  in  Noura- 
nian  et  al.39  These  figures  compare  the  MEAM-computed  energy  vs.  distance  plots 
to  the  same  first  principle  energy  vs.  distance  plots  for  separation  between  various 
molecules  (C2H6,  CH4,  and  H2)  and  atoms  (C-C,  H-H,  C-H).  As  a  measure  of 
how  similar  the  2  curves  are,  the  mean  absolute  error  and  the  root  mean  square 
error  (i.e.,  L|  and  L2  norms,  respectively)  are  used  as  the  similarity/distance  met¬ 
rics  within  the  analysis.  In  general,  the  same  parameters  are  important  for  both 
metrics.  Interestingly,  the  MEAM  parameters  that  are  mainly  driving  these  energy- 
distance  metrics  are  related  to  the  single  element  H  MEAM  potential:  attrac22  (()))), 
asub2  {Ah),  esub2  (. E°H ),  alat2  (. R°H ),  and  bl2  (fi^j j).  For  the  hydrogen  dimers 
energy-distance  relationships,  this  is  to  be  expected,  but  this  also  shows  that  the 
interaction  between  different  molecules  (i.e.,  van  der  Waals’  interaction)  is  largely 
influenced  by  the  H  atoms  on  the  C2H6  and  CH4  molecules.  Last,  the  percent  con¬ 
tribution  of  the  MEAM  parameters  to  the  responses  does  not  always  sum  to  100% 
(i.e.,  Fig.  13);  in  some  cases,  the  residuals  (or  error  term  in  Table  4)  for  the  linear 
regression  model  applied  during  the  ANOVA  analysis  are  a  significant  fraction  of 
the  overall  variability  of  the  responses. 
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Fig.  12  Summary  of  ANOVA  results  for  the  energy  vs.  distance  relationships  for  various 
molecules  and  paths.  In  this  contour  plot,  the  mean  absolute  error  between  the  calculated 
MEAM  curve  and  first  principles  data  at  the  same  distances  was  used  as  a  response  variable. 
The  percent  contribution  by  each  of  the  MEAM  parameters  was  determined  by  dividing  the 
sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of  squares  (e.g.,  see  Table  4), 
which  was  repeated  for  each  molecule. 
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Fig.  13  Summary  of  ANOVA  results  for  the  energy  vs.  distance  relationships  for  various 
molecules  and  paths.  In  this  contour  plot,  the  root  mean  square  error  between  the  calculated 
MEAM  curve  and  first  principles  data  at  the  same  distances  was  used  as  a  response  variable. 
The  percent  contribution  by  each  of  the  MEAM  parameters  was  determined  by  dividing  the 
sum  of  squares  attributed  to  each  parameter  by  the  total  sum  of  squares  (e.g.,  see  Table  4), 
which  was  repeated  for  each  molecule. 
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In  general,  these  analyses  can  be  used  in  a  few  ways  in  the  potential  development 
process.  First,  Figs.  3-13  can  help  to  evaluate  the  parameter  sensitivity  for  various 
properties  in  the  training  set.  Ultimately  there  will  be  tradeoffs  that  are  encoun¬ 
tered  and  can  be  managed  through  a  multi-objective  optimization  approach,  but  this 
can  help  identify  which  parameters  may  be  involved  with  optimization  of  particular 
properties.  In  some  cases,  depending  on  the  level  of  complexity  of  the  interatomic 
potential,  it  may  be  easier  to  spawn  multiple  suboptimization  problem  formulations 
for  various  properties  rather  than  trying  to  optimize  all  properties  for  all  parameters. 
This  can  also  be  used  for  defining  the  appropriate  parameter  ranges  for  potential  de¬ 
velopment.  While  Figs.  3-13  show  how  important  different  parameters  are  for  dif¬ 
ferent  properties  and  molecules,  it  also  shows  which  properties  do  not  significantly 
influence  properties.  This  analysis  may  provide  guidance  in  terms  of  expanding 
the  range  of  various  parameters  to  assess  whether  the  property  in  question  is  truly 
unaffected  by  the  parameter  or  whether  the  parameter  range  needs  to  be  expanded. 

3.3  Property  Similarity  for  Different  Molecules/Potentials 

The  present  study  contains  properties  for  30  different  molecules  as  a  function  of 
different  combinations  of  MEAM  parameters.  The  similarity  between  the  prop¬ 
erties  of  different  molecules  can  shed  some  light  on  the  uniqueness  of  different 
molecules  for  potential  use  when  constructing  training  sets  for  interatomic  poten¬ 
tials.  In  this  particular  study,  a  Latin  hypercube  sampling  technique  was  used  to 
sample  the  26-dimensional  parameter  space  described  for  the  fractional  factorial 
design.  From  this,  1000  different  MEAM  potentials  were  each  used  to  generate 
the  30  different  molecules  (i.e.,  30,000  molecules)  and  their  corresponding  prop¬ 
erties.  The  Pearson  correlation  coefficient  R  was  used  as  a  metric  for  similarity 
of  the  different  properties  (stemming  from  the  1000  different  MEAM  potentials). 
Figures  14-20  show  the  Pearson  correlation  coefficient  for  different  properties  (en¬ 
ergies,  bond  distances,  and  bond  angles)  when  compared  to  all  other  molecules. 
For  instance,  the  Pearson  correlation  coefficients  of  the  energies  of  the  molecules 
for  the  Latin  hypercube  design  is  shown  in  Fig.  14.  A  high  correlation  coefficient 
indicates  that  the  energy  of  the  2  molecules  on  the  horizontal  and  vertical  axes 
scales  similarly  even  as  the  MEAM  parameters  are  changed  for  the  1000  different 
MEAM  potentials.  For  example,  if  a  new  MEAM  interatomic  potential  is  chosen 
and  the  energy  of  C3H8  (EC:,jjJ  increases  (by  SE{),  the  energy  of  C8Hi8  (ECsh18) 
will  also  increase  (by  5E2 )  in  such  a  manner  that  the  slope  of  the  increases  is  con- 
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stant  and  predictable  (i.e.,  ECsh18  —  ( 8E2/8Ei )  EC3hs )•  While  this  is  perhaps  not 
unexpected  for  the  different  molecules  within  the  alkane  series,  for  instance,  it  is 
interesting  that  there  is  large  correlation  (R  —  1)  between  different  families  of 
molecules  (e.g.,  between  C8H18  and  cycloalkane  cC6H12).  A  low  (or  no)  correla¬ 
tion  indicates  that  the  properties  for  2  molecules  are  not  in  line  with  each  other; 
the  energy  of  C2  does  not  align  with  the  energy  of  H2  (not  surprising  since  differ¬ 
ent  parts  of  the  interatomic  potential  control  these  properties  independent  of  one 
another).  While  this  low  (or  no)  correlation  for  2  molecules  is  in  itself  not  of  high 
value,  the  rows/columns  of  low/no  correlation  can  shed  light  on  the  uniqueness  of 
certain  molecules.  For  instance,  if  H2  dimer  is  not  in  a  training  set  for  the  potential, 
then  one  would  not  expect  any  of  the  alkane,  cycloalkane,  or  polyolefin  molecules 
to  accurately  capture  this  response.  As  has  been  shown  in  Fig.  3,  this  particular  sit¬ 
uation  arises  from  different  potential  parameters  controlling  the  response  of  these 
molecules.  Figure  15  also  shows  the  correlation  coefficient  for  the  same  properties 
for  the  fractional  factorial  design  (i.e.,  64  MEAM  potentials  vs.  the  1000  MEAM 
potentials  in  the  Latin  hypercube  design).  In  general,  the  fractional  factorial  design 
yields  similar  behavior  to  the  more  expansive  coverage  offered  by  the  1000-point 
Latin  hypercube  sampling  strategy,  with  a  fraction  of  the  simulations. 
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Fig.  14  Correlation  coefficient  R  for  the  energies  of  all  molecules  examined  (for  the  Latin 
hypercube  sampling  design).  In  this  plot,  each  response  is  plotted  against  all  of  the  other  re¬ 
sponses.  A  high  correlation  coefficient  indicates  the  strength  of  the  linear  dependence  between 
the  2  responses  (i.e.,  a  high  correlation,  R=l,  for  responses  plotted  against  themselves). 
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Correlation  Coefficient,  R 
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Fig.  15  Correlation  coefficient  R  for  the  energies  of  all  molecules  examined  (for  the  fractional 
factorial  design  case).  In  this  plot,  each  response  is  plotted  against  all  of  the  other  responses. 
A  high  correlation  coefficient  indicates  the  strength  of  the  linear  dependence  between  the  2 
responses  (i.e.,  a  high  correlation,  R=l,  for  responses  plotted  against  themselves). 
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Correlation  Coefficient,  R 


This  correlation  between  properties  is  also  plotted  for  bond  distances  (Figs.  16  and 
17)  and  bond  angles  (Figs.  18-20).  In  terms  of  the  C-H  bond  distance  (Fig.  16),  this 
is  the  most  correlated  properties  between  the  different  molecules  (i.e.,  this  prop¬ 
erty  behaves  similarly  with  respect  to  the  interatomic  potential  no  matter  what  the 
molecule  is,  the  one  exception  being  the  C-H  dimer  bond  distance).  On  the  other 
hand,  the  C-C  bond  distance  (Fig.  17)  is  very  different  in  terms  of  its  correlation 
between  molecules.  This  plot  indicates  that  C2,  cC3H6,  cC4H8,  C2H2,  and  C2H4  are 
very  dissimilar  from  the  remaining  molecules,  which  tend  to  have  C-C  bond  dis¬ 
tances  that  are  similarly  affected  by  interatomic  potential.  This  is  perhaps  expected; 
the  dimers  (C2  in  this  case)  tend  to  have  very  different  properties  than  the  other 
molecules,  the  cycloalkanes  with  C  less  than  4  (cC3H6,  cC4H8)  have  constrained 
bond  angles  that  can  alter  their  properties,  and  the  alkene/alkyne  molecules  (C2H2, 
C2H4)  have  a  very  different  bonding  state  between  the  C  atoms  due  to  the  absence 
of  H  atoms.  Interestingly,  comparing  Fig.  17  with  parameter  plot  in  Fig.  5,  one  no¬ 
tices  subtle  changes  in  the  percent  contribution  of  the  different  MEAM  parameters 
to  the  C-C  bond  distance  of  these  molecules;  this  translates  into  very  noticeable  de¬ 
viations  from  perfect  correlation  in  Fig.  17.  Again,  this  would  be  as  expected.  For 
instance,  while  asubl  (Ac)  affects  the  C-C  distance  of  these  molecules,  this  param¬ 
eter  has  little-to-no  effect  on  the  other  molecules.  If  held  fixed,  the  correlation  will 
be  higher;  if  allowed  to  change,  this  will  affect  the  properties  of  only  a  select  group 
of  molecules,  resulting  in  a  lower  correlation  as  observed  herein. 
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Fig.  16  Correlation  coefficient  R  for  the  C-H  bond  distances  of  all  molecules  examined.  The 
mean  bond  distance  for  all  C-H  bonds  was  used  for  those  molecules  with  multiple  C-H  bonds. 
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Fig.  17  Correlation  coefficient  R  for  the  C-C  bond  distances  of  all  molecules  examined.  The 
mean  bond  distance  for  all  C-C  bonds  was  used  for  those  molecules  with  multiple  C-C  bonds. 
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Figures  18-20  show  the  correlation  between  molecules  with  respect  to  the  C-C-C, 
C-C-H,  and  H-C-H  bond  angles,  respectively.  In  these  cases,  there  are  definitive 
dissimilarities  between  different  molecules.  For  instance,  the  cycloalkanes  are  dis¬ 
similar  from  other  molecules  in  terms  of  the  C-C-C  bond  angles,  but  have  varying 
degrees  of  similarity  in  terms  of  the  C-C-H  bond  angles  and  H-C-H  bond  angles. 
Similar  observations  can  be  made  about  some  of  the  other  molecules  and  families 
of  molecules.  In  part,  the  dissimilarity  between  molecules,  particularly  present  in 
Fig.  19  for  C-C-H  bond  angles,  can  also  be  driven  by  the  calculation  of  the  bond 
angle  itself.  The  bond  angles  represented  in  this  work  are  the  result  of  taking  all 
bond  angles  of  a  particular  type  and  averaging  the  values,  thus  different  local  en¬ 
vironments  (e.g.,  head  or  tail  group  of  a  linear  alkane  molecule)  can  influence  the 
average  bond  angle.  For  example,  even  a  simple  molecule  such  as  the  C6Hi4  alkane 
molecule  has  19  bonds,  36  bond  angles,  and  45  dihedral  angles,  which  are  broken 
down  in  the  following  manner:  5  C-C  bonds,  14  C-H  bonds,  4  C-C-C  bond  angles, 
22  C-C-H  bond  angles,  10  H-C-H  bond  angles,  3  C-C-C-C  dihedral  angles,  18 
C-C-C-H  dihedral  angles,  and  24  H-C-C-H  dihedral  angles.  Of  the  22  C-C-H 
bond  angles,  most  bond  angles  are  within  0.2°  of  each  other  ( « 109.7°)  except  for  2 
bond  angles  at  the  head  and  tail  of  the  molecule  that  have  bond  angles  of  113.28°. 
However,  it  was  necessary  to  reduce  the  dimension  of  this  data  in  some  manner 
and  the  average  response  was  deemed  an  appropriate  measure  for  the  present  study 
(e.g.,  1000  potentials,  1  C8H18  molecule,  30  different  C-C-H  bond  angles). 
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Fig.  18  Correlation  coefficient  R  for  the  C-C-C  bond  angles  of  all  molecules  examined.  The 
mean  bond  angle  for  all  C-C-C  bond  angles  was  used  for  those  molecules  with  multiple  C-C-C 
bond  angles. 
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Fig.  19  Correlation  coefficient  R  for  the  C-C-H  bond  angles  of  all  molecules  examined. 
The  mean  bond  angle  for  all  C-C-H  bond  angles  was  used  for  those  molecules  with  multi¬ 
ple  C-C-H  bond  angles. 
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Fig.  20  Correlation  coefficient  R  for  the  H-C-H  bond  angles  of  all  molecules  examined. 
The  mean  bond  angle  for  all  H-C-H  bond  angles  was  used  for  those  molecules  with  multiple 
H-C-H  bond  angles. 
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In  general,  the  Latin  hypercube  sampling  provides  more  expansive  coverage  of  the 
interatomic  potential  design  space  in  contrast  to  the  fractional  factorial  design  pre¬ 
sented.  However,  the  fractional  factorial  methods  were  found  to  give  many  of  the 
same  trends  in  terms  of  parameters.  The  preceding  analysis  can  be  used  to  better 
understand  the  types  of  molecules  or  tests  that  should  be  included  in  the  training 
set  as  well  as  which  ones  are  duplicates  (i.e.,  correlated).  Furthermore,  these  tests 
may  provide  insight  as  to  how  to  weigh  different  properties.  For  instance,  Fig.  14 
shows  that  many  of  the  energies  of  molecules  are  correlated.  This  has  several  im¬ 
plications  in  terms  of  weighting  the  molecule  atomization  energies.  First,  if  all  are 
equally  weighted  during  multi-objective  optimization  of  the  interatomic  potential 
and  most  molecules  are  correlated,  then  clearly  the  potential  will  be  biased  towards 
capturing  the  energies  of  the  correlated  molecules  rather  than  the  noncorrelated 
molecules.  Second,  if  the  weighting  is  biased  towards  the  correlated/noncorrelated 
molecules,  then  the  potential  may  not  be  as  transferable  to  other  molecules  that 
were  not  weighted  as  highly  during  its  formulation.  Third,  if  correlations  exist,  this 
can  be  taken  advantage  of  in  the  training  set  in  terms  of  reducing  the  computational 
cost.  That  is,  the  energy  of  more  complex  molecule,  which  might  take  much  longer 
time  to  minimize  their  structure,  can  be  reformulated  as  functions  of  less  complex 
molecules  and  included  in  the  optimization  process  without  running  a  simulation 
(i.e.,  a  surrogate  model  based  on  a  high  degree  of  correlation  with  less  expensive 
properties).  This  may  have  broad  implications.  A  predictive  surrogate  model  for 
melting  temperature  based  on  MEAM  parameters  and  less  expensive  properties 
could  be  used  in  the  optimization  process  itself,  rather  than  just  as  a  validation 
test  afterwards.  These  notions  and  others  can  be  inferred  from  some  of  the  studies 
presented  within. 

4.  Summary  and  Conclusion 

An  interatomic  potential  for  saturated  hydrocarbons39  using  MEAM,  a  semiempir- 
ical  many-body  potential  based  on  density  functional  theory,  was  parameterized  to 
the  bond  distances,  bond  angles,  and  atomization  energies  at  0  K  of  a  series  of 
alkane  structures  from  methane  to  n-octane.  In  this  work,  the  parameters  of  the 
MEAM  potential  were  explored  through  a  design  of  experiments  and  Latin  hyper¬ 
cube  sampling  approach  to  better  understand  how  individual  MEAM  parameters  af¬ 
fected  several  properties  of  molecules  (energy,  bond  distances,  bond  angles,  and  di¬ 
hedral  angles)  and  the  relationship/correlation  between  various  molecules  in  terms 
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of  these  properties.  Beyond  these  results  of  sensitivity  of  MEAM  parameters  and  the 
correlation  between  different  properties,  it  was  also  found  that  reduced-order  frac¬ 
tional  factorial  design  of  experiment  with  64  MEAM  combinations  yielded  similar 
results  to  the  Latin  hypercube  sampling  design  with  1000  MEAM  combinations,  for 
instance.  The  methodology  can  be  easily  extended  to  other  potential  formulations 
and  can  be  useful  for  understanding  parameter  effects,  understanding  similarities 
between  properties,  and  constructing  training  set  tests  for  the  interatomic  potential 
development  process. 
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List  of  Symbols,  Abbreviations,  and  Acronyms 


INN 

first  nearest-neighbor 

2NN 

second  nearest-neighbor 

ANOVA 

analysis  of  variance 

BCC 

body-centered  cubic 

COMB 

charge-optimized  many-body 

COMPASS 

condensed-phase  optimized  molecular  potentials  for  atomistic 

simulation  studies 

EAM 

embedded- atom  method 

FCC 

face-centered  cubic 

HCP 

hexagonal  close-packed 

MEAM 

modified  embedded-atom  method 

PVT 

pressure- volume-temperature 

ReaxFF 

reactive  force  field 

REBO 

reactive  empirical  bond  order 

UEOS 

universal  equation  of  state 
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